Newsgroups: comp.lang.forth
Date: Sun, 2 Apr 2023 10:53:11 -0700 (PDT)
From: Lorem Ipsum <gnuarm.deletethisbit@gmail.com>
Subject: Re: 8 Forth Cores for Real Time Control
Message-Id: <ec17a8fd-b59b-4e16-b8a7-2225c6a2a9f2n@googlegroups.com>
In-Reply-To: <2023Apr2.143625@mips.complang.tuwien.ac.at>

On Sunday, April 2, 2023 at 9:03:48 AM UTC-4, Anton Ertl wrote:
> Lorem Ipsum <gnuarm.del...@gmail.com> writes:
> >On Sunday, April 2, 2023 at 4:53:14 AM UTC-4, Anton Ertl wrote:
> >> Christopher Lozinski <caloz...@gmail.com> writes:
> >> >> > I do understand how to make a register machine pipelined. I
> >> >> > have no idea how to make a stack machine pipelined.
> >> >> How is it any different???
> >> >
> >> >Fetch the instruction,
> >> >fetch the operands,
> >> >do the instruction,
> >> >write the results.
> >> >
> >> >On a stack machine, the operands are already on the stack, and the
> >> >result is written to the stack, so there is no opportunity to
> >> >pipeline those.
> >> The way you present it, you have just the same opportunities as for a
> >> register machine (and of course, also the costs, such as forwarding
> >> the result to the input if you want to be able to execute instructions
> >> back-to-back). And if you do it as a barrel processor, as suggested
> >> by Lorem Ipsum, AFAICS you have to do that.
> >
> >I don't know what AFAICS means,
> As Far As I Can See.
>
> >but in a "barrel" processor, as you call it, you don't need any special
> >additions to the design to accommodate this type of pipelining, because
> >there is no overlap of processing instructions of a single, virtual
> >processor. The instruction is processed 100% before beginning the next
> >instruction. With no overlap, there's no need for "forwarding the
> >result".
>
> Yes. My wording was misleading. What I meant: If you want to
> implement a barrel processor with a stack architecture, you have to
> treat the stack in many respects like a register file, possibly
> resulting in a pipeline like above.

I'm still not following. I'm not sure what you have to do with the
register file, other than to have N of them like all other logic. The
stack can be implemented in block RAM. A small counter points to the
stack being processed at that time. You can only perform one stack read
and one write for each processor per instruction.

To make it simple, say it was a 4x design.
The four stages could be instruction decode, ALU1, ALU2 and final. The
instruction fetch happens on the final cycle, as do stack ops. There is
no special stack "read", as a stack always presents the top item and
next on stack, but the inputs to the ALU need to be captured at the end
of instruction decode in the additional pipeline registers. IIRC, in my
designs (not pipelined), I had the memory operations a half clock out of
step, which would be equivalent to doing the memory read/write in the
ALU1 cycle.

Some aspects of the stack operations might be pipelined. In my early CPU
design, the stack ops were speed limiting for the entire CPU. But this
had to do with producing over/underflow flags, which were reported in a
processor status word. This is not an essential part of a stack
processor. In the above example, the stack ops could be split, with half
done in the instruction decode phase.

I would expect register ops to be simple and fast enough not to require
pipelining. But the address (register index) calculation might require
pipelining. Register CPUs are typically RMW, since the registers have to
be selected before being "read". A stack processor can be designed to
have its top two elements available immediately after a stack operation.
It's a bit like a register machine with dedicated ALU registers. I
recall some processors always did ALU ops using one fixed register and a
selectable other register.

> By contrast, for a single-thread stack-based CPU, what is the
> forwarding bypass (i.e., an optimization) of a register machine is the
> normal path for the TOS of a stack machine; but not for a barrel
> processor with a stack architecture.

I guess I simply don't know what you mean by "forwarding bypass". I
found this:

https://en.wikipedia.org/wiki/Operand_forwarding

But I don't follow that either. This has to do with the data of the two
instructions being related.
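The arrangement described above (a per-thread stack in block RAM,
selected by a small phase counter that advances every cycle) can be
sketched as a toy software model. This is a hypothetical illustration
with invented names, not a hardware description; it shows why a given
thread's result is always written back before that thread's next
instruction issues, so no forwarding path is needed:

```python
# Minimal model of a 4-deep barrel processor over a stack architecture.
# Hypothetical sketch: 4 hardware threads, each with its own stack
# (standing in for FPGA block RAM), selected by a phase counter that
# advances every cycle.  A thread only issues one instruction every 4
# cycles, so its previous result is fully written back before its next
# instruction touches the stack.

NTHREADS = 4

class BarrelStackCPU:
    def __init__(self, programs):
        self.programs = programs                     # one instruction list per thread
        self.pcs = [0] * NTHREADS                    # per-thread program counters
        self.stacks = [[] for _ in range(NTHREADS)]  # per-thread data stacks
        self.phase = 0                               # selects the active thread

    def step(self):
        t = self.phase
        prog, pc = self.programs[t], self.pcs[t]
        if pc < len(prog):
            op, *arg = prog[pc]
            stack = self.stacks[t]
            if op == "lit":
                stack.append(arg[0])
            elif op == "add":                # operands come implicitly from TOS/NOS
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            self.pcs[t] = pc + 1
        self.phase = (self.phase + 1) % NTHREADS     # round-robin, every cycle

# One tiny program per thread: push two literals and add them.
progs = [[("lit", n), ("lit", 10), ("add",)] for n in range(NTHREADS)]
cpu = BarrelStackCPU(progs)
for _ in range(3 * NTHREADS):                        # enough cycles for all threads
    cpu.step()
print([s[-1] for s in cpu.stacks])                   # -> [10, 11, 12, 13]
```

Note that `step` never looks at any other thread's state; the four
instruction streams are fully independent, which is the point being
argued above.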
In the barrel stack processor, each phase of the processor is an
independent instruction stream. So there are no data dependencies
involving the stack. In a pipelined stack CPU, there very much could be
data dependencies. Every time the stack is adjusted, the CPU would
stall.

> >If you say that, you don't understand what is going on. The only added
> >cost in a barrel processor is the added FFs, which are not "added"
> >relative to multiple cores. Meanwhile, you have saved all the logic
> >between the FFs. The amount of additional logic would be very minimal.
> >So there would be a large savings in logic overall.
>
> The logic added in pipelining depends on what is pipelined (over in
> comp.arch Mitch Alsup has explained several times how expensive a
> deeply pipelined multiplier is: at some design points it's cheaper to
> have two multipliers with half the pipelining that are used in
> alternating cycles).

If you are talking about adding logic for a pipeline, that is some
optimization you are performing. It's not inherent in the pipelining
itself. Pipelining only requires that the logic flow be broken into
steps by registers. This reduces the clock cycle time. In a pipeline
with independent instruction streams, there is no added logic to deal
with problems like stalls from data interactions.

> In any case, the cost is significant in
> transistors, in area and in power; in the early 2000s Intel and AMD
> planned to continue their clock race by even deeper pipelining than
> they had until then (looking at pipelines with 8 FO4 gate equivalents
> per stage), but they found that they had trouble cooling the resulting
> CPUs, and so settled on ~16 FO4 gate equivalents per stage.

I can't say anything about massive Intel processors.
In the small CPUs we are working with, this problem does not exist,
mostly because there is no additional logic other than the registers and
the phase counter.

> > How many commercial stack processors have you seen in the last 20
> > years? I know of none. So why bother trying to design a stack
> > processor?
>
> My understanding is that this is a project he does for educational
> purposes. I think that he can learn something from designing a stack
> processor; and if that's not enough, maybe some extension or other.
> He may also learn something from designing a barrel processor. But
> from designing a barrel processor with a stack architecture, at best
> he will learn why that squanders the implementation benefits of a
> stack architecture; but without first designing a single-threaded
> stack machine, I fear that he would miss that, and would not learn
> much about what the difference between stack and register machines
> means for the implementation, and he may also miss some interesting
> properties of barrel processors.

He is talking about building a chip. That doesn't sound like an
educational project. If he wants to learn, I think he should design both
the register CPU and a stack CPU. How else to compare the issues of
each?

So you are suggesting he build both the stack and register machines as
non-pipelined and as pipelined? How else to learn about all types?

How does a barrel stack processor "squander" anything??? He wants to
design a chip with eight processors. I'm showing him he can design a
single logical processor and pipeline it to work as eight processors.
His initial statement was about a real-time control CPU for his thesis.
That's where the barrel processor excels. It provides eight processors
in much less logic than eight separate processors would take.
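To put rough numbers on the stall argument, here is a back-of-the-
envelope cycle model. The four-stage depth and the full-writeback stall
penalty are assumptions chosen for illustration, not figures from any
real design:

```python
# Toy cycle counts for a 4-stage pipeline, purely illustrative numbers.
# Assumption: without forwarding, each dependent instruction waits for
# the producer's writeback, so instructions issue DEPTH cycles apart;
# with a forwarding bypass they issue back-to-back.  In a 4-way barrel,
# consecutive instructions of one thread are already 4 cycles apart, so
# the dependency never costs anything and no bypass logic is needed.

DEPTH = 4      # pipeline stages
N = 100        # length of a fully dependent instruction chain

# Single thread, no forwarding: one instruction every DEPTH cycles.
no_forwarding = N * DEPTH
# Single thread, with forwarding: one per cycle once the pipe is full.
with_forwarding = DEPTH + (N - 1)
# 4-way barrel, 4 threads of N instructions each (4*N total): one
# instruction enters every cycle with no stalls and no bypass logic.
barrel = DEPTH + (4 * N - 1)

print(no_forwarding, with_forwarding, barrel)   # 400 103 403
```

In this toy model the barrel retires four times the work of the
forwarding pipeline in roughly four times the cycles, i.e. the same
per-cycle utilization, without paying for the bypass network; that is
the trade being claimed above.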
Multiple processors are often essential because multitasking on a single
processor can have significant limitations and place significant burdens
on the CPUs and software.

I realize this is just a master's thesis, but designing what is, in
reality, a simple CPU doesn't seem to come up to the level required.
Using pipelining to implement eight processors in a single CPU
architecture would seem to be a bit more "interesting" project.

I've changed a lot since I entered the workplace. Now, I would expect
the student to have done an analysis to determine the requirements for
this processor, and how the unique features of the design contribute to
meeting those requirements. In school, I was not taught a single thing
about the real world, other than that digital waveforms were not the
smooth, clean signal
========== REMAINDER OF ARTICLE TRUNCATED ==========